Abstract. We present rigorous results on some open questions on NSRPS, the non-sequential recursive pairs substitution method (see Grassberger [4]). In particular, starting from the action of NSRPS on finite strings, we define a corresponding natural action on measures and we prove that the iterated measure becomes asymptotically Markov. This certifies the effectiveness of NSRPS as a tool for data compression and entropy estimation.
1. Introduction

We consider here a suitable non-sequential recursive pair substitution method (NSRPS), which has been proposed by Jimenez-Montano, Ebeling and others […]. This method has been studied and precisely defined by P. Grassberger as a tool for data compression and entropy estimation […]. He deduced some important properties of the method and used it to estimate the entropy of written English.

In particular, the results found in […] and the conjectures made therein are the main motivation for this paper.

Data compression is one of the most interesting research fields in information theory, from both the applied and the theoretical viewpoint. In particular, data compression algorithms provide a powerful tool for measuring the entropy and, more generally, for estimating the complexity of a sequence. The first algorithms (Shannon-Fano, Huffman; see for example […, 3]) were based on a suitable coding of single characters, or of strings of a fixed and small number of characters. A great improvement in the field of data compression came with the dictionary-based compression methods LZ77, LZ78 and LZW […], in which variable-length strings are suitably encoded. In particular, in LZ78 a sequence is encoded as a list of phrases. Initially the phrases coincide with the characters, and then each new phrase is obtained sequentially by adding a character to one of the existing phrases. The NSRPS method we are going to study here, although different in many respects from these dictionary methods, has some similarity with LZ78, and in particular with a variation of LZ78 which has been recently proposed [2].

The NSRPS method works in the following way. Let us consider a sequence $s^0$ built with the characters of a finite alphabet $A = \{a_0, \dots, a_{m-1}\}$. For any given $i, j$ let $n_{ij}$ be the number of non-overlapping occurrences of the string $a_i a_j$ in $s^0$, and let $i_0, j_0$ be the pair (or one of the pairs) for which $n_{ij}$ is maximum. Now let us define a new sequence $s^1$, obtained from $s^0$ by substituting each occurrence of the pair $a_{i_0}a_{j_0}$ with a new symbol $a_m$. The new sequence is shorter than the previous one and its alphabet has one more character. Then, starting from $s^1$, we define a new sequence $s^2$ with the same procedure, et cetera. We call a single step of NSRPS (for example, the one that transforms $s^0$ into $s^1$) a "pair substitution".
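A single pair substitution can be sketched in a few lines of Python. This is only an illustration under our own conventions, not Grassberger's implementation. Symbols are list items, the function names are ours, and ties between equally frequent pairs are broken arbitrarily by `Counter.most_common`. The last line reproduces the first binary example discussed next.

```python
from collections import Counter

def nonoverlapping_counts(s):
    """Non-overlapping occurrences of every pair of consecutive symbols in s.

    For x != y the occurrences of xy cannot overlap, so they are simply counted.
    For a pair xx, a maximal run of k copies of x contains k // 2 disjoint copies.
    """
    counts = Counter((u, v) for u, v in zip(s, s[1:]) if u != v)
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= 2:                       # maximal run s[i:j] of length j - i
            counts[(s[i], s[i])] += (j - i) // 2
        i = j
    return counts

def substitute(s, x, y, a):
    """Scan s from the left and replace each occurrence of the pair xy with a."""
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] == x and s[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def nsrps_step(s, new_symbol):
    """One pair substitution: replace a pair with maximal non-overlapping count."""
    (x, y), _count = nonoverlapping_counts(s).most_common(1)[0]
    return substitute(s, x, y, new_symbol), (x, y)

s1, pair = nsrps_step(list("0010101010001001010101110101"), "2")
print(pair, "".join(s1))   # ('0', '1') 02222002022221122
```

For a pair of equal symbols, the count must come from the run lengths rather than from overlapping matches. This is the distinction the second example below makes explicit.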

For the sake of clarity, let us consider two specific examples in which the initial sequence is binary. First, let us consider the case in which

$$s^0 = 0010101010001001010101110101$$

and we substitute 01 with the new character 2. We obtain

$$s^1 = 02222002022221122$$

As said above, the sequence $s^1$ is shorter than $s^0$. In particular, denoting with $|s|$ the length of a generic sequence $s$, we have

$$|s^1| = |s^0| - \#\{01 \subset s^0\},$$

where $\#\{01 \subset s^0\}$ is the number of times we find 01 in the string $s^0$. Dividing by $|s^0|$:

$$\frac{|s^1|}{|s^0|} = 1 - \frac{\#\{01 \subset s^0\}}{|s^0|}.$$

We always work with sequences extracted from an ergodic measure $\mu$. Then, taking the limit as $|s^0| \to \infty$, we get, for almost all sequences,

$$\lim \frac{|s^1|}{|s^0|} = 1 - \mu(01). \qquad (1.1)$$

Another important fact is that the transformation is invertible (see Section 2), so the two sequences carry the same amount of information (see Section 8). Therefore, if $h(s)$ is the entropy per character of $s$:

$$\frac{h(s^1)}{h(s^0)} = \lim \frac{|s^0|}{|s^1|} = \frac{1}{1-\mu(01)}.$$



The second example we consider is when the pair to be substituted is made of two equal characters. Let us consider the sequence

$$s^0 = 1001100100000011001000001000010001$$

and let us substitute 00 with 2. We find the new sequence

$$s^1 = 12112122211212201221201 \dots$$

The main difference with respect to the case considered before is that here we do not substitute with 2 all the pairs of consecutive 0s in $s^0$. For instance $1001 \to 121$, but $10001 \to 1201$. It is easy to deduce that in this case (1.1) becomes

$$\lim \frac{|s^1|}{|s^0|} = 1 - \mu(00) + \mu(000) - \mu(0000) + \mu(00000) - \dots \qquad (1.2)$$
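For a finite string, the series in (1.2) is an exact identity. A maximal run of $k$ zeros loses $\lfloor k/2 \rfloor$ symbols, and $\sum_{j=2}^{k}(-1)^j (k-j+1) = \lfloor k/2 \rfloor$. The Python sketch below checks this on the example above and on all short binary strings. The helper names are ours.

```python
from itertools import groupby, product

def shortening_xx(s, x):
    """|s| - |G(s)| for xx -> a: a maximal run of k copies of x loses k // 2 symbols."""
    return sum(len(list(run)) // 2 for c, run in groupby(s) if c == x)

def count_overlapping(s, block):
    """Number of (possibly overlapping) occurrences of block in the string s."""
    return sum(1 for i in range(len(s) - len(block) + 1) if s.startswith(block, i))

def alternating_sum(s, x):
    """Finite-string version of the series in (1.2): sum_{k>=2} (-1)^k #{x^k in s}."""
    return sum((-1) ** k * count_overlapping(s, x * k) for k in range(2, len(s) + 1))

s0 = "1001100100000011001000001000010001"
print(shortening_xx(s0, "0"), alternating_sum(s0, "0"))   # 11 11

# The identity holds for every binary string up to length 10.
for n in range(1, 11):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert shortening_xx(w, "0") == alternating_sum(w, "0")
```

Dividing both sides by $|s^0|$ and letting the length grow gives (1.2) along typical sequences.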

This example shows that under an NSRPS the probabilities of strings can behave in a complicated way. Despite this, the substitution process transforms a Markov sequence into a Markov sequence, as proved by Grassberger in [1].

In general, if the starting sequence is not Markov, it does not become Markov after a finite number of transformations. Nevertheless, it is reasonable to expect that the sequence tends to become Markov as the number of transformations tends to infinity. This is exactly what was conjectured in [3], and what we prove here.

More precisely, the main facts we prove are the following.

In any pair substitution the conditional entropy $h_1$ (i.e. the entropy of a character conditioned on the previous character), suitably normalized, does not increase. If the process is already Markov then it stays constant (actually, there are other rare cases in which $h_1$ stays constant, see Section … and Section …).

This is a general property of pair transformations and holds whatever substitution is made. An immediate corollary is that Markov sequences are transformed into Markov sequences.

Suppose that, as the number of transformations goes to $\infty$, the inverse $Z$ of the average shortening also diverges. Then the (suitably normalized) conditional entropy $h_1$ tends to the entropy of the sequence. In this sense we prove that in the limit the process becomes Markov. In particular, this is the case if at each step we substitute the pair of characters which maximizes the number of non-overlapping occurrences. This condition is not strictly necessary. However, as we shall see in Section 5, the result does not hold for every sequence of substitutions.

The paper is organized as follows.

In Section 2 we fix notation and give some preliminary results. In particular, we discuss how pair substitutions act on strings and give a natural definition of a corresponding action on ergodic measures.

In Section 3 we state results on how pair substitutions act on entropies.

In Section 4 we prove the main result of the paper.

In Section 5 we discuss some examples.

In Section 6 we give some concluding remarks.

In Sections 7 and 8 we collect technical results on how measures and entropies, respectively, transform under the action of a pair substitution.
2. How pair substitutions act on strings and measures

2.1. Strings 

Given an alphabet $A$ we denote with $A^* := \bigcup_{k\ge1} A^k$ the set of finite words in the alphabet $A$. Elements of $A^*$ are indicated with underlined lower-case Latin letters, such as $\underline{w}$. The same notation will also be used for infinite words (elements of $A^{\mathbb N}$) and doubly infinite words (elements of $A^{\mathbb Z}$). An element $w$ has length $|w|$ and, if $|w| = k$, it is also indicated with

$$w_1^k := w_1 \dots w_k := (w_1, \dots, w_k).$$

Let us consider $x, y \in A$ (including $x = y$), $a \notin A$, and $A' = A \cup \{a\}$. A pair substitution is a map $G = G^a_{xy} : A^* \to A'^*$ which substitutes, in order, the occurrences of $xy$ with $a$. More precisely, $Gw$ is defined by substituting in $w$ the first occurrence from the left of $xy$ with $a$, and then repeating this procedure until the end of the string.

We also define the map $S = S^{xy}_a : A'^* \to A^*$, which acts on the words $z \in A'^*$ by substituting each occurrence of the symbol $a$ with the pair $xy$.

Notice that the map $G$ is injective and not surjective, while the map $S$ is surjective and not injective. Notice also that $S|_{G(A^*)} = G^{-1}$, i.e.

$$S(G(w)) = w \quad \text{for any } w \in A^*. \qquad (2.3)$$
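The maps $G$ and $S$ translate directly into code. The Python sketch below uses our own function names and represents words as tuples of one-character strings. It checks (2.3) and the injectivity of $G$ exhaustively on short binary words, both for $x \ne y$ and for $x = y$.

```python
from itertools import product

def G(w, x, y, a):
    """Pair substitution G^a_{xy}: from the left, replace each xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return tuple(out)

def S(z, x, y, a):
    """Inverse substitution S^{xy}_a: replace each a with the pair xy."""
    out = []
    for c in z:
        out.extend((x, y) if c == a else (c,))
    return tuple(out)

# Check (2.3) and the injectivity of G on all binary words of length <= 8.
for x, y in [("0", "1"), ("0", "0")]:
    for n in range(1, 9):
        words = list(product("01", repeat=n))
        assert all(S(G(w, x, y, "a"), x, y, "a") == w for w in words)
        assert len({G(w, x, y, "a") for w in words}) == len(words)
```

$S$ is not injective. For instance, both the one-letter word `("a",)` and the word `("0", "1")` are mapped by $S^{01}_a$ to `("0", "1")`.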

We remark that these definitions also work in the case of infinite sequences $w \in A^{\mathbb N}$ and $z \in A'^{\mathbb N}$.

It is easy to see that the set of admissible words $G(A^*)$ is a subset of $A'^*$ which can be described by constraints on consecutive symbols. In the case $xy \to a$ with $x \ne y$, $G(A^*)$ consists of the strings of $A'^*$ in which the pair $xy$ does not appear. In the case $xx \to a$, $G(A^*)$ consists of the strings of $A'^*$ in which the pairs $xx$ and $xa$ do not appear. An important fact is that, after the application of several pair substitutions, the set of admissible words is still described by constraints on consecutive symbols. This follows from the fact that a pair substitution maps pair constraints into pair constraints, as stated in the following theorem.

Theorem 2.1 Let $\{V_{a,b}\}_{a,b\in A}$ be a matrix with 0-1 valued elements (the constraint matrix), and let $A^*_V$ be the subset of $A^*$ whose elements $w$ verify

$$\prod_{i=1}^{|w|-1} V_{w_i, w_{i+1}} = 1$$

($A^*_V$ is the set of admissible strings with respect to the pair constraints given by $V$). There exists a constraint matrix $V'$ with indices in $A'$ such that

$$G(A^*_V) = A'^*_{V'}.$$

The proof follows from direct inspection. Here we only write $V'$ in terms of $V$. Let $z, w \in A \setminus \{x, y\}$: the values of the elements of $V'$ are given by the following tables:

[Tables: elements of $V'$ in the cases $x \ne y$ and $x = y$.]

Note that these expressions hold if $V_{x,y} = 1$ and $V_{x,x} = 1$, respectively. Otherwise, and this is an uninteresting case, $G(A^*_V) = A^*_V$.
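The matrix $V'$ can also be obtained mechanically. The brute-force Python sketch below is our own illustration, not the construction used in the proof. It enumerates all admissible words up to a fixed length, applies $G$, and records which pairs of $A'$ occur. The bound `max_len` is an assumption: it must be large enough for every admissible pair to show up. The constraint "forbid 11" in the usage example is a hypothetical choice.

```python
from itertools import product

def G(w, x, y, a):
    """Pair substitution: from the left, replace each occurrence of xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return tuple(out)

def induced_constraints(A, V, x, y, a, max_len=6):
    """Brute-force V': which pairs of A' = A + [a] occur in G(A*_V)."""
    seen = set()
    for n in range(2, max_len + 1):
        for w in product(A, repeat=n):
            if all(V[(w[i], w[i + 1])] for i in range(n - 1)):   # w in A*_V
                z = G(w, x, y, a)
                seen.update(zip(z, z[1:]))
    A1 = list(A) + [a]
    return {(p, q): int((p, q) in seen) for p in A1 for q in A1}

A = ("0", "1")
V = {(p, q): 1 for p in A for q in A}
V[("1", "1")] = 0                                  # hypothetical constraint: forbid 11
Vp = induced_constraints(A, V, "0", "1", "a")
print(sorted(k for k, v in Vp.items() if v == 0))  # [('0', '1'), ('1', '1'), ('a', '1')]
```

With no constraints, the same function returns exactly the forbidden pairs listed above: only $xy$ when $x \ne y$, and $xx$, $xa$ when $x = y$.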
2.2. Measures

We indicate with $\mathcal E(A)$ the set of ergodic stationary measures on $A^{\mathbb Z}$, the only measures we are interested in. If $\mu \in \mathcal E(A)$ we use the shorthand notation $\mu(w)$ to indicate the value of the $|w|$-marginal of $\mu$ on the word $w$.

The maps $G^a_{xy}$ and $S^{xy}_a$ induce the maps $\mathcal G = \mathcal G^a_{xy} : \mathcal E(A) \to \mathcal E(A')$ and $\mathcal S = \mathcal S^{xy}_a : \mathcal E(A') \to \mathcal E(A)$ in the following natural sense. Let $\mu \in \mathcal E(A)$ and let $w \in A^{\mathbb N}$ be a frequency-typical sequence with respect to $\mu$. Let $\nu \in \mathcal E(A')$ and let $z \in A'^{\mathbb N}$ be a frequency-typical sequence with respect to $\nu$. The sequence $Gw$ is typical for an ergodic measure that we call $\mathcal G\mu$, and the sequence $Sz$ is typical for an ergodic measure that we call $\mathcal S\nu$.

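More precisely, denote the number of occurrences of a subword $s$ in $r$ by $\#\{s \subset r\} := \sum_{i=1}^{|r|-|s|+1} \mathbb I(r_i^{i+|s|-1} = s)$, where $\mathbb I$ is the characteristic function. The following holds.

Theorem 2.2 Let $s \in A'^*$; then

$$\mathcal G\mu(s) := \lim_{n\to+\infty} \frac{\#\{s \subset G(w_1^n)\}}{|G(w_1^n)|} \qquad (2.4)$$

exists and is constant $\mu$-almost everywhere in $w$; moreover $\{\mathcal G\mu(s)\}_{s\in A'^*}$ are the marginals of an ergodic measure on $A'^{\mathbb Z}$.

In an analogous way, let $r \in A^*$; then

$$\mathcal S\nu(r) := \lim_{n\to+\infty} \frac{\#\{r \subset S(z_1^n)\}}{|S(z_1^n)|} \qquad (2.5)$$

exists and is constant $\nu$-almost everywhere in $z$; moreover $\{\mathcal S\nu(r)\}_{r\in A^*}$ are the marginals of an ergodic measure on $A^{\mathbb Z}$. It holds

$$\mathcal S\mathcal G\mu = \mu. \qquad (2.6)$$

In Section 7 we give the proof of the theorem and of the following propositions, which we use for the main theorem in Section 4. Moreover, from (2.4) and (2.5) we write the explicit expressions of $\mathcal G\mu$ and $\mathcal S\nu$ in terms of $\mu$ and $\nu$, respectively.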

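Proposition 2.1 Let $Z = Z^\mu_{xy}$ be the inverse of the mean shortening, with respect to $\mu$, of a string under the action of $G^a_{xy}$. Let $W = W^\nu_a$ be the mean lengthening, with respect to $\nu$, of a string under the action of $S^{xy}_a$. If $x \ne y$,

$$Z^\mu_{xy} := \lim_{n\to+\infty} \frac{n}{|G(w_1^n)|} = \frac{1}{1-\mu(xy)} \qquad (\mu \text{ a.e. in } w). \qquad (2.7)$$

If $x = y$,

$$Z^\mu_{xx} := \lim_{n\to+\infty} \frac{n}{|G(w_1^n)|} = \frac{1}{1-\sum_{k\ge2}(-1)^k\mu(x^k)} \qquad (\mu \text{ a.e. in } w), \qquad (2.8)$$

where $x^k$ is the sequence of $k$ times $x$. Moreover

$$W^\nu_a := \lim_{n\to+\infty} \frac{|S(z_1^n)|}{n} = 1 + \nu(a) \qquad (\nu \text{ a.e. in } z). \qquad (2.9)$$

Finally,

$$Z^\mu_{xy} = W^{\mathcal G\mu}_a. \qquad (2.10)$$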
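Proposition 2.1 can be illustrated by simulation. The Python sketch below uses a hypothetical i.i.d. binary source with $P(1) = 0.3$; the sample size and seed are arbitrary choices of ours. For such a source $\mu(01) = P(0)P(1)$ and $\sum_{k\ge2}(-1)^k\mu(0^k) = q^2/(1+q)$ with $q = P(0)$.

Some relations hold exactly on finite strings. These are $|S(z)| = |z| + \#\{a \subset z\}$, and, through (2.3), the equality of the empirical $Z$ and $W$. Formulas (2.7) and (2.8) hold only up to sampling error.

```python
import random

def G(w, x, y, a):
    """Pair substitution: from the left, replace each occurrence of xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return tuple(out)

def S(z, x, y, a):
    """Inverse substitution: replace each a with the pair xy."""
    out = []
    for c in z:
        out.extend((x, y) if c == a else (c,))
    return tuple(out)

rng = random.Random(0)
p1, n = 0.3, 200_000                   # hypothetical i.i.d. source with P('1') = 0.3
w = tuple("1" if rng.random() < p1 else "0" for _ in range(n))

# x != y, formula (2.7) with mu(01) = P('0') P('1')
z = G(w, "0", "1", "a")
Z_emp, Z_th = n / len(z), 1 / (1 - (1 - p1) * p1)
W_emp = len(S(z, "0", "1", "a")) / len(z)      # (2.9)-(2.10): equals Z_emp since S(G(w)) = w

# x = y, formula (2.8) with sum_{k>=2} (-q)^k = q^2 / (1 + q), q = P('0')
q = 1 - p1
z2 = G(w, "0", "0", "a")
Z2_emp, Z2_th = n / len(z2), 1 / (1 - q * q / (1 + q))
print(round(Z_emp, 3), round(Z_th, 3), round(Z2_emp, 3), round(Z2_th, 3))
```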

Proposition 2.2 Let $r \in A^*$; the value of $\mathcal S\nu(r)$ depends only on the values of $\nu(s)$ with $|s| \le |r|$.

We remark that this assertion is false for $\mathcal G\mu$. In the case $x = y$, $\mathcal G\mu(s)$ can involve the probability of infinitely many strings of increasing lengths (see Eqs. (7.33)).

Proposition 2.3 (Invertibility of $\mathcal S\nu$)

If $\nu \in \mathcal E(A')$ respects the pair constraints given by $G$, i.e. for $z \in A'^*$

$$\nu(z) = 0 \quad \text{if } z \notin G(A^*),$$

then

$$\nu = \mathcal G\mathcal S\nu.$$

3. How pair substitutions act on the entropy per symbol 

Given $\mu \in \mathcal E(A)$, $n \ge 1$, and indicating with $\log$ the base-2 logarithm:

$H_n(\mu) := -\sum_{|w|=n} \mu(w) \log \mu(w)$ is the $n$-block entropy,

$h_n(\mu) := H_{n+1}(\mu) - H_n(\mu)$ is the $n$-conditional entropy,

$h(\mu) := \lim_{n\to+\infty} H_n(\mu)/n = \lim_{n\to+\infty} h_n(\mu)$ is the entropy of $\mu$.

It holds:

$$h(\mu) \le \dots \le h_{j+1}(\mu) \le h_j(\mu) \le \dots \le h_1(\mu) \le H_1(\mu). \qquad (3.11)$$

Denote the conditional probabilities by $\mu(z|w) := \mu(wz)/\mu(w)$. We say that $\mu$ is a $k$-Markov measure if for any $n > k$, $w \in A^n$ and $a \in A$, $\mu(a|w) = \mu(a|w_{n-k+1}^n)$. In this case $h(\mu) = h_j(\mu)$ for all $j \ge k$. We remark that $h(\mu) = h_k(\mu)$ implies that $\mu$ is a $k$-Markov measure.
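The chain (3.11) can be observed on empirical measures. The Python sketch below computes plug-in block entropies of a sequence read cyclically, which is our own choice. With cyclic reading, the empirical block distributions are the marginals of a stationary (periodic) measure, so the inequalities hold exactly rather than approximately. The test sequence, a noisy period-3 pattern, is arbitrary.

```python
import random
from collections import Counter
from math import log2

def block_entropy(s, n):
    """Plug-in H_n of the empirical n-block distribution of s, read cyclically."""
    L = len(s)
    ext = s + s[:n - 1]                 # wrap around so every position starts a block
    counts = Counter(tuple(ext[i:i + n]) for i in range(L))
    return -sum(c / L * log2(c / L) for c in counts.values())

def conditional_entropy(s, n):
    """h_n = H_{n+1} - H_n."""
    return block_entropy(s, n + 1) - block_entropy(s, n)

# Hypothetical test sequence: a noisy period-3 pattern (any sequence would do).
rng = random.Random(1)
s = [("0", "0", "1")[i % 3] if rng.random() < 0.8 else rng.choice("01") for i in range(5000)]
hs = [conditional_entropy(s, n) for n in range(1, 6)]
print([round(h, 4) for h in hs], round(block_entropy(s, 1), 4))
```

For a linear (non-cyclic) reading, the inequalities may fail by small boundary terms.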

We collect here some results on how entropies transform under the action of $\mathcal G$. Proofs are postponed to the technical Section 8.

We will use the shorthand $Z = Z_{xy}$, and sometimes $Z^\mu = Z^\mu_{xy}$ when we need to stress the reference measure.

Theorem 3.1

$$h(\mathcal G\mu) = Z h(\mu). \qquad (3.12)$$






In fact the information content of the string $w$ is the same as that of the string $G(w)$.

Theorem 3.2

$$h_1(\mathcal G\mu) \le Z h_1(\mu). \qquad (3.13)$$

Moreover, if $\mu$ is a 1-Markov measure, $\mathcal G\mu$ is a 1-Markov measure.

Let us notice here that the second assertion is a consequence of the first: if $\mu$ is a 1-Markov measure,

$$h(\mathcal G\mu) \le h_1(\mathcal G\mu) \le Z h_1(\mu) = Z h(\mu) = h(\mathcal G\mu). \qquad (3.14)$$

Then $h_1(\mathcal G\mu) = h(\mathcal G\mu)$; this implies that $\mathcal G\mu$ is a 1-Markov measure.
This theorem can also be generalized.

Theorem 3.3

$$h_k(\mathcal G\mu) \le Z h_k(\mu), \qquad (3.15)$$

and $\mathcal G$ maps $k$-Markov measures into $k$-Markov measures.
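For an i.i.d. (hence 1-Markov) source, Theorem 3.2 says that $\mathcal G\mu$ is again 1-Markov. Then (3.12) and (3.14) give $h_1(\mathcal G\mu) = h_2(\mathcal G\mu) = Z h(\mu)$. The Monte Carlo sketch below compares plug-in estimates on a long sample. The Bernoulli parameter, sample size and seed are arbitrary choices of ours, so the agreement holds only up to sampling error.

```python
import random
from collections import Counter
from math import log2

def G(w, x, y, a):
    """Pair substitution: from the left, replace each occurrence of xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return tuple(out)

def h_cond(s, n):
    """Plug-in h_n = H_{n+1} - H_n of the empirical measure of s (cyclic reading)."""
    def H(m):
        L = len(s)
        ext = s + s[:m - 1]
        c = Counter(ext[i:i + m] for i in range(L))
        return -sum(v / L * log2(v / L) for v in c.values())
    return H(n + 1) - H(n)

rng = random.Random(2)
p1, n = 0.3, 200_000                   # hypothetical i.i.d. source, P('1') = 0.3
w = tuple("1" if rng.random() < p1 else "0" for _ in range(n))
h_mu = -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))   # entropy of the i.i.d. source
z = G(w, "0", "1", "a")
Z = n / len(z)                         # empirical normalization
h1_G, h2_G = h_cond(z, 1), h_cond(z, 2)
print(round(h1_G, 4), round(h2_G, 4), round(Z * h_mu, 4))
```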

4. The main result 

Theorem 3.2 asserts, roughly speaking, that the amount of information of $G(w)$, which is equal to that of $w$, is more concentrated on the pairs of symbols than in the original string $w$. This suggests that a sequence of pair substitutions can transfer all the information into the distribution of the pairs of symbols. To formalize this assertion, let us define recursively:

the alphabets $A_N = A_{N-1} \cup \{a_N\}$, where $a_N \notin A_{N-1}$, with $A_0 = A$;

the maps $G_N = G^{a_N}_{x_N y_N} : A_{N-1}^* \to A_N^*$, where $x_N, y_N \in A_{N-1}$;

the corresponding maps $\mathcal G_N = \mathcal G^{a_N}_{x_N y_N}$, $S_N = S^{x_N y_N}_{a_N}$, $\mathcal S_N = \mathcal S^{x_N y_N}_{a_N}$;

the measures $\mu_N = \mathcal G_N \mu_{N-1}$, with $\mu_0 = \mu$;

the normalization $Z_N = Z^{\mu_{N-1}}_{x_N y_N}$;

the composed maps

$$\bar G_N = G_N \circ \dots \circ G_1, \qquad \bar{\mathcal G}_N = \mathcal G_N \circ \dots \circ \mathcal G_1,$$
$$\bar S_N = S_1 \circ \dots \circ S_N, \qquad \bar{\mathcal S}_N = \mathcal S_1 \circ \dots \circ \mathcal S_N;$$

the corresponding normalization $\bar Z_N = Z_N Z_{N-1} \cdots Z_1$ (when we need to specify the initial measure we will use the symbol $\bar Z^\mu_N$).

In [3] the author chose at each step the pair of symbols with the maximum frequency of non-overlapping occurrences. This choice ensures the divergence of $\bar Z_N$, as we will prove using Theorem 3.2.

Theorem 4.1 If at each step $N$ the pair $x_N y_N$ is the pair with the maximum frequency of non-overlapping occurrences among the pairs of symbols of $A_{N-1}$, then

$$\lim_{N\to+\infty} \bar Z_N = +\infty. \qquad (4.16)$$
In this case the hypothesis of the following (main) theorem is satisfied.

Theorem 4.2 If

$$\lim_{N\to+\infty} \bar Z_N = +\infty \qquad (4.17)$$

then

$$h(\mu) = \lim_{N\to+\infty} \frac{h_1(\mu_N)}{\bar Z_N}. \qquad (4.18)$$
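Proof of Th. 4.1

Let $p_N$ be the maximum of the probability $\mu_{N-1}$ on the pairs of symbols of $A_{N-1}$. From the definition of $Z$, it follows that

$$\bar Z_N \ge \bar Z_{N-1}\left(1 + \frac{p_N}{2}\right)$$

(the factor 2 accounts for the case of substitution of two equal symbols). We can estimate $p_N$ with

$$p_N \ge 2^{-H_2(\mu_{N-1})},$$

where $H_2(\mu_{N-1}) = -\sum_{ab\in A_{N-1}^2} \mu_{N-1}(ab)\log\mu_{N-1}(ab)$ is the 2-block entropy. Using Th. 3.2 and the bound $H_1(\mu_{N-1}) \le \log(N-1+|A|)$, where $|A|$ is the cardinality of $A$:

$$H_2(\mu_{N-1}) = h_1(\mu_{N-1}) + H_1(\mu_{N-1}) \le \bar Z_{N-1} h_1(\mu) + \log(N-1+|A|).$$

Then

$$\frac{\bar Z_N}{\bar Z_{N-1}} \ge 1 + \frac{2^{-\bar Z_{N-1} h_1(\mu)}}{2(N-1+|A|)}.$$

The sequence $\bar Z_N$ is increasing. Suppose, by contradiction, that $\bar Z_N$ tended to a finite limit. The previous inequality would then give $\bar Z_N/\bar Z_{N-1} \ge 1 + c/(N-1)$ for some constant $c > 0$, which implies $\bar Z_N \to +\infty$.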

Remark: this proof is also valid in the more general case in which we choose $x_N y_N$ in such a way that

$$\mu_{N-1}(x_N y_N) \ge c\, p_N,$$

where $c$ is a constant independent of $N$.

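Proof of Th. 4.2

For the composition $\bar S_N$ it holds

$$\bar S_N(s_1^n) = \bar S_N(s_1) \cdots \bar S_N(s_n),$$

where the $\bar S_N(s_i)$ are words in the original alphabet $A$. Consider $r \in A^*$ with $|r| = k$, and a typical string $s$ for $\mu_N$. Up to boundary terms, the occurrences of $r$ can be grouped according to the number $p$ of consecutive blocks $\bar S_N(s_i)$ they touch:

$$\#\{r \subset \bar S_N(s_1)\cdots\bar S_N(s_n)\} = \sum_{g\in A_N} \#\{r \subset \bar S_N(g)\}\,\#\{g \subset s_1^n\} + \sum_{p=2}^{k}\sum_{g_1,\dots,g_p\in A_N} \#^*\{r \subset \bar S_N(g_1)\cdots\bar S_N(g_p)\}\,\#\{g_1\cdots g_p \subset s_1^n\},$$

where $\#^*\{r \subset \bar S_N(g_1)\cdots\bar S_N(g_p)\}$ is the number of occurrences of $r$ in the string $\bar S_N(g_1)\cdots\bar S_N(g_p)$ which start in $\bar S_N(g_1)$ and end in $\bar S_N(g_p)$. We obtain

$$\mu(r) = \frac{1}{\bar Z_N}\left(\sum_{g\in A_N} \#\{r \subset \bar S_N(g)\}\,\mu_N(g) + \sum_{p=2}^{k}\sum_{g_1,\dots,g_p\in A_N} \#^*\{r \subset \bar S_N(g_1)\cdots\bar S_N(g_p)\}\,\mu_N(g_1\cdots g_p)\right). \qquad (4.20)$$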



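Let $\mathcal P$ be the projection operator that maps a measure $\mu$ to its 1-Markov approximation $\mathcal P\mu$, and define $\pi^N_j = \mathcal S_{j+1} \cdots \mathcal S_N \mathcal P\mu_N$. In particular we have $\pi^N_0 = \bar{\mathcal S}_N \mathcal P\mu_N$ and $\pi^N_N = \mathcal P\mu_N$. It holds

$$\pi^N_N = \bar{\mathcal G}_N \pi^N_0. \qquad (4.21)$$

In fact the measures $\pi^N_N$ and $\mu_N$ coincide on the pairs of symbols. Hence $\pi^N_N$ gives zero probability to the pairs forbidden in $G_N(A_{N-1}^*)$. Since this set is described by pair constraints (Th. 2.1), $\pi^N_N(w) = 0$ if $w \notin G_N(A_{N-1}^*)$. We can therefore apply Proposition 2.3, obtaining

$$\pi^N_N = \mathcal G_N \mathcal S_N \pi^N_N = \mathcal G_N \pi^N_{N-1}. \qquad (4.22)$$

Now $\pi^N_{N-1}$ and $\mu_{N-1}$ also coincide on the pairs of symbols (see Proposition 2.2), so we can iterate the procedure until we obtain Eq. (4.21).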

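Note that

$$\bar Z^{\pi^N_0}_N = \prod_{j=1}^N \left(1 + \pi^N_j(a_j)\right) = \prod_{j=1}^N \left(1 + \mu_j(a_j)\right) = \bar Z_N; \qquad (4.23)$$

in fact $\pi^N_j$ and $\mu_j$ coincide on the pairs of symbols of $A_j$. Fix any $k$ and any $r$ of length $k$. The terms with $p \le 2$ in (4.20) and in the analogous expression for $\pi^N_0$ cancel, and

$$|\pi^N_0(r) - \mu(r)| \le \frac{1}{\bar Z_N}\sum_{p=3}^{k}\sum_{g_1,\dots,g_p\in A_N} (\pi^N_N + \mu_N)(g_1\cdots g_p)\,\#^*\{r \subset \bar S_N(g_1)\cdots\bar S_N(g_p)\} \le \frac{2k^2}{\bar Z_N}, \qquad (4.24)$$

which tends to 0 as $N \to +\infty$. This implies that

$$\lim_{N\to+\infty} h_k(\pi^N_0) = h_k(\mu).$$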



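In conclusion, for any $k$

$$h(\mu) = \frac{h(\mu_N)}{\bar Z_N} \le \frac{h_1(\mu_N)}{\bar Z_N} = \frac{h(\pi^N_N)}{\bar Z_N} = h(\pi^N_0) \le h_k(\pi^N_0).$$

We stress that the third step of the previous chain follows from the definition $\pi^N_N = \mathcal P\mu_N$, and that the fourth step follows from (4.21) and (4.23). Taking the limits $N \to +\infty$ and $k \to +\infty$:

$$h(\mu) = \lim_{N\to+\infty} \frac{h_1(\mu_N)}{\bar Z_N}.$$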

5. Some examples 

We consider here a given sequence of pair substitutions which is not obtained by minimizing the length of the new strings, as the NSRPS method prescribes.

The initial alphabet is $A = \{0, 1\}$. The first pair substitution is $10 \to 2$, the second $20 \to 3$; in general the $N$-th substitution is $N0 \to N+1$. Notice that the infinite composition of these substitutions corresponds to a block coding procedure. Each maximal block of $k$ consecutive zeros, together with the one that precedes it, is replaced by the new symbol $k+1$.
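The correspondence with block coding can be checked mechanically. In the Python sketch below, symbols are integers, the function names are ours, and `substitute` is the left-to-right pair replacement of Section 2. When $N$ is at least the length of the longest run of zeros, applying $10 \to 2, \dots, N0 \to N+1$ gives the same result as the direct block coding. Zeros before the first one are never touched.

```python
import random

def substitute(s, x, y, a):
    """Scan s from the left and replace each occurrence of the pair xy with a."""
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] == x and s[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def section5_substitutions(bits, N):
    """Apply 10 -> 2, 20 -> 3, ..., N0 -> N+1 (in this order) to a 0-1 string."""
    s = [int(b) for b in bits]
    for k in range(1, N + 1):
        s = substitute(s, k, 0, k + 1)
    return s

def block_code(bits):
    """Code each maximal block '1 0^k' with k+1; zeros before the first 1 stay 0."""
    out, i = [], 0
    while i < len(bits) and bits[i] == "0":
        out.append(0)
        i += 1
    while i < len(bits):
        j = i + 1
        while j < len(bits) and bits[j] == "0":
            j += 1
        out.append(j - i)          # one '1' followed by j - i - 1 zeros -> symbol j - i
        i = j
    return out

print(section5_substitutions("0110100010011", 5))   # [0, 1, 2, 4, 3, 1, 1]
```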

If the initial measure gives positive probability to the pair 11, then the normalization cannot diverge. Indeed, for an initial (typical) string of length $L$, at least $\mu(11)L$ symbols remain after the transformations.

Only the first substitution involves the symbol one, so the following computations are straightforward:

$$\mu_N(1|1) = \mu(1|11), \qquad \mu_N(1|11) = \mu(1|111).$$

Suppose that for the initial measure $\mu(1|111) \ne \mu(1|11)$. Then $\mu_N(1|11) \ne \mu_N(1|1)$ for any $N$, and $h_1(\mu_N)/\bar Z_N$ cannot converge to $h(\mu)$ (the limiting process cannot be a 1-Markov process).

On the other hand, we can take as initial measure a finite-mean renewal process. This is a stationary process in which the distances between consecutive ones are i.i.d. random variables with distribution $\{p_k\}_{k\ge1}$ and $\sum_{j\ge1} j p_j < +\infty$. The entropy of such a process is

$$h(\mu) = -\frac{1}{E}\sum_{k\ge1} p_k \log p_k, \qquad E = \sum_{j\ge1} j p_j.$$

An explicit computation of the marginals of $\mu_N$ is not difficult. The resulting expressions involve $E$ and the partial sums $p_1 + \dots + p_{N+1}$.
Note that if we consider the measures $\mu_N$ as measures on the alphabet $\mathbb N$, then $\mu_N$ weakly converges to the product measure with marginals $\{p_k\}_{k\ge1}$. From this (or by direct computation) we can derive

$$\lim_{N\to+\infty} \frac{h_1(\mu_N)}{\bar Z_N} = \lim_{N\to+\infty} \frac{H_1(\mu_N)}{\bar Z_N} = h(\mu).$$

In this case the process becomes independent, so $H_1(\mu_N)/\bar Z_N$ also converges to the entropy. This is a consequence of the very particular choice of the initial measure. Suppose instead that the distances between consecutive ones are not independent, but follow, for instance, a two-step Markov process. Then $h_1(\mu_N)/\bar Z_N$ and $H_1(\mu_N)/\bar Z_N$ do not converge to the entropy.

6. Concluding remarks 

The main result proved here says that under the action of the NSRPS procedure any ergodic process becomes asymptotically Markov, i.e. $h_1(\mu_N)/\bar Z_N \to h$. A natural question is when the process even becomes independent, i.e. $H_1(\mu_N)/\bar Z_N \to h$, as in the very specific example discussed in Section 5. In our opinion this is a non-trivial question, presumably depending on the behavior of the number of forbidden sequences in the iterated measures.

The results of this paper also imply that an NSRPS algorithm can estimate the entropy of an ergodic source from a sufficiently long sequence, say of length $L$. One iterates $N(L)$ pair substitutions, with $N(L)$ diverging sufficiently slowly with $L$. One then computes the conditional entropy $h_1$ of the empirical measure of the resulting sequence. An interesting question is how fast $N(L)$ can diverge with $L$.
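The estimator just described can be sketched as follows. This is our own minimal version and several choices in it are assumptions:

- $h_1$ is a plug-in estimate.
- The normalization $\bar Z_N$ in (4.18) is estimated by the length ratio $|s^0|/|s^N|$.
- Symbols are non-negative integers, and each new symbol is the next unused integer.
- $N$ is fixed by hand, since choosing $N(L)$ is exactly the open question above.

```python
import random
from collections import Counter
from math import log2

def nonoverlapping_counts(s):
    """Non-overlapping pair counts (runs of equal symbols contribute k // 2)."""
    counts = Counter((u, v) for u, v in zip(s, s[1:]) if u != v)
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= 2:
            counts[(s[i], s[i])] += (j - i) // 2
        i = j
    return counts

def substitute(s, x, y, a):
    """Scan s from the left and replace each occurrence of the pair xy with a."""
    out, i = [], 0
    while i < len(s):
        if i + 1 < len(s) and s[i] == x and s[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def h1_empirical(s):
    """Plug-in h_1 = H_2 - H_1 of the empirical measure of a finite sequence."""
    n, m = len(s), len(s) - 1
    H1 = -sum(c / n * log2(c / n) for c in Counter(s).values())
    H2 = -sum(c / m * log2(c / m) for c in Counter(zip(s, s[1:])).values())
    return H2 - H1

def nsrps_entropy_estimate(s, N):
    """N NSRPS steps, then h_1 of the result divided by Z_N ~ |s^0| / |s^N|."""
    L, s = len(s), list(s)
    new = max(s) + 1
    for _ in range(N):
        (x, y), _count = nonoverlapping_counts(s).most_common(1)[0]
        s = substitute(s, x, y, new)
        new += 1
    return h1_empirical(s) * len(s) / L

rng = random.Random(4)
coin = [rng.randrange(2) for _ in range(20_000)]   # hypothetical fair-coin source, h = 1
print(round(nsrps_entropy_estimate(coin, 10), 3))
```

For a fair coin the exact value is $h = 1$ for every $N$, because Markov measures stay Markov. The printed estimate is therefore only a sanity check of the bookkeeping, not of the convergence rate.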

Analogously, it is possible to define an asymptotically optimal compression algorithm based on NSRPS. Iterating the pair substitution procedure a suitable number of times yields an approximately Markov sequence. This sequence can be compressed by an algorithm that takes into account only the pair correlations (for instance, a suitable arithmetic coder). As before, if the number of substitutions diverges sufficiently slowly with $L$, then the compression rate converges to $h$.
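The second stage of such a scheme can be sketched by computing the ideal code length of a sequence under an adaptive first-order model. Our assumption is a Laplace (add-one) estimate of $P(\text{current} \mid \text{previous})$. An arithmetic coder driven by the same model would output, in exact arithmetic, a number of bits within a couple of bits of this value.

The sketch does not include the cost of describing the substitution rules. After $N$ substitutions on an alphabet $A$, one would call it with `alphabet_size` $= |A| + N$.

```python
import random
from collections import defaultdict
from math import log2

def order1_code_length(s, alphabet_size):
    """Ideal code length in bits of s under an adaptive first-order Laplace model."""
    bits = log2(alphabet_size)                  # first symbol: uniform code
    pair_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for prev, cur in zip(s, s[1:]):
        p = (pair_counts[prev, cur] + 1) / (context_counts[prev] + alphabet_size)
        bits -= log2(p)                         # cost of coding cur after prev
        pair_counts[prev, cur] += 1
        context_counts[prev] += 1
    return bits

rng = random.Random(5)
coin = [rng.randrange(2) for _ in range(10_000)]   # hypothetical fair-coin data
print(round(order1_code_length(coin, 2)))           # close to 10000 bits
print(round(order1_code_length([0] * 1000, 2), 2))  # 1 + log2(1000), about 10.97
```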

In practice, given a sequence of length $L$, it is not obvious how to decide efficiently the optimal number of substitutions to make. This point is briefly discussed in [4], and we do not go into it here.

7. Technical results on measures transformations 

7.1. Proof of Theorem 2.2

We do not give a formal proof of the theorem but only a sketch (more details are given in the analogous proof of Proposition 2.1 in the next subsection). The fact that the limits are almost surely constant can be deduced from the strong law of large numbers. This fact implies the ergodicity of $\mathcal G\mu$ and $\mathcal S\nu$ (see Theorem 1.4.2, p. 44 of […]). The compatibility conditions for the families of marginals are easily checked. Formula (2.6) is a consequence of (2.3).

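7.2. Proof of Proposition 2.1

In the case $x \ne y$, we have

$$|G(w_1^n)| = n - \#\{xy \subset w_1^n\},$$

so that

$$\frac{n}{|G(w_1^n)|} = \frac{1}{1 - \#\{xy \subset w_1^n\}/n},$$

and the result (2.7) follows from the strong law of large numbers.

In the case $x = y$ we have that

$$|G(w_1^n)| = n - \sum_{k\ge2}\left[\frac k2\right]\#\{{*}x^k{*} \subset w_1^n\}.$$

Here $[\,\cdot\,]$ is the integer part, and $\#\{{*}x^k{*} \subset w_1^n\}$ is the number of blocks of exactly $k$ consecutive $x$ contained in $w_1^n$ ($*$ stands for a possible occurrence of a generic letter different from $x$). It holds

$$\sum_{k\ge2}\left[\frac k2\right]\#\{{*}x^k{*} \subset w_1^n\} = \sum_{k\ge2}(-1)^k\,\#\{x^k \subset w_1^n\},$$

since a block of exact length $m$ contains $m-k+1$ copies of $x^k$ and $\sum_{k=2}^{m}(-1)^k(m-k+1) = [m/2]$. Now we have

$$\frac{n}{|G(w_1^n)|} = \frac{1}{1 - \sum_{k\ge2}(-1)^k\,\#\{x^k \subset w_1^n\}/n}.$$

This converges to the right-hand side of (2.8) for any ergodic measure $\mu$ other than the measure concentrated on the sequence of all $x$ (in which case, clearly, $Z = 2$).

Formula (2.9) follows from

$$|S(z_1^n)| = n + \#\{a \subset z_1^n\}$$

and the strong law of large numbers.

Formula (2.10) can be deduced from (2.3).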

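7.3. $\mathcal S\nu$ in terms of $\nu$

We consider the substitution $a \to xy$. We have that

$$\mathcal S\nu(r) := \lim_{n\to+\infty} \frac{\#\{r \subset S(z_1^n)\}}{|S(z_1^n)|} = \lim_{n\to+\infty} \frac{1}{Wn}\sum_{|z|=n}\nu(z)\,\#\{r \subset S(z)\}, \qquad (7.25)$$

where

$$W = \lim_{n\to+\infty} \frac{|S(z_1^n)|}{n} = \lim_{n\to+\infty} \sum_{|z|=n}\nu(z)\frac{|S(z)|}{n}. \qquad (7.26)$$

Suppose now that $|r| = k$ and consider, for $n > k$,

$$D_n = \sum_{|z|=n}\nu(z)\,\#\{r \subset S(z)\} - \sum_{|z|=n-1}\nu(z)\,\#\{r \subset S(z)\} = \sum_{|z|=n}\nu(z)\,\mathbb I(r = S(z)_1^k) + \sum_{|z|=n-1}\nu(az)\,\mathbb I(r = yS(z)_1^{k-1}).$$

We can rewrite these terms as

$$\sum_{|z|=n}\nu(z)\,\mathbb I(r = S(z)_1^k) = \sum_{s:\,S(s)=r}\left(\nu(s) + \nu(s_1^{|s|-1}a)\,\mathbb I(r_k = x)\right),$$

$$\sum_{|z|=n-1}\nu(az)\,\mathbb I(r = yS(z)_1^{k-1}) = \sum_{s:\,S(s)=r}\left(\nu(as_2^{|s|}) + \nu(as_2^{|s|-1}a)\,\mathbb I(r_k = x)\right)\mathbb I(r_1 = y). \qquad (7.27)$$

Hence $D_n$ is constant for $n > k$ and

$$\lim_{n\to+\infty}\frac1n\sum_{|z|=n}\nu(z)\,\#\{r \subset S(z)\} = D_n. \qquad (7.28)$$

Collecting (7.25)-(7.28) we obtain the expression for $\mathcal S\nu$:

$$\mathcal S\nu(r) = \frac1W\sum_{s:\,S(s)=r}\left(\nu(s) + \nu(s_1^{|s|-1}a)\,\mathbb I(r_k = x) + \nu(as_2^{|s|})\,\mathbb I(r_1 = y) + \nu(as_2^{|s|-1}a)\,\mathbb I(r_1 = y)\,\mathbb I(r_k = x)\right). \qquad (7.29)$$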

7.4. $\mathcal G\mu$ in terms of $\mu$

The map $\mathcal S$ inverts $\mathcal G$, so to find the expression of $\mathcal G\mu$ we can invert the expression of $\mathcal S\mathcal G\mu = \mu$. Let $\nu$ be $\mathcal G\mu$. The sum over $s$ in Eq. (7.29) reduces to $s = G(r)$, since $\nu(s) = 0$ if $s \notin G(A^*)$. This reduction makes Eq. (7.29) explicitly invertible, but we have to distinguish the two cases $x \ne y$ and $x = y$.

Case $x \ne y$.

Let $r \in A^*$ and let $z, w \in A$ be such that $z \ne x$ and $w \ne y$. From (7.29) we obtain:

$$W\mu(wrz) = \nu(G(wrz))$$
$$W\mu(wrx) = \nu(G(wr)x) + \nu(G(wr)a)$$
$$W\mu(yrz) = \nu(yG(rz)) + \nu(aG(rz)) \qquad (7.30)$$
$$W\mu(yrx) = \nu(yG(r)x) + \nu(aG(r)x) + \nu(yG(r)a) + \nu(aG(r)a)$$

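Let now $s = G(r)$ with $|s| = n$ and $|r| = k$. The expression of $\nu(s) = \mathcal G\mu(s)$ can be calculated from the previous equations, obtaining

$s_1 \ne y,\ s_n \ne x$: $\nu(s) = W\mu(r)$

$s_1 = y,\ s_n \ne x$: $\nu(s) = W\left(\mu(r) - \mu(xyr_2^k)\right)$

$s_1 \ne y,\ s_n = x$: $\nu(s) = W\left(\mu(r) - \mu(r_1^{k-1}xy)\right)$

$s_1 = y,\ s_n = x$: $\nu(s) = W\left(\mu(r) + \mu(xyr_2^{k-1}xy) - \mu(xyr_2^k) - \mu(r_1^{k-1}xy)\right)$ (7.31)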



Now we can calculate $Z = W$ (see Eq. (2.10)) in terms of $\mu$:

$$Z = 1 + \nu(a) = 1 + Z\mu(xy) \quad\Longrightarrow\quad Z = \frac{1}{1-\mu(xy)}.$$

We remark that Eqs. (7.31) can be synthesized in

$$\mathcal G\mu(s) = Z\sum_{p,q\in A:\ psq\in G(A^*)}\mu(p\,S(s)\,q). \qquad (7.32)$$
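Formula (7.32) can be compared with the frequencies in (2.4) on a simulated sample. The Python sketch below uses a hypothetical i.i.d. measure with $P(1) = 0.3$, for which $\mu(w)$ is a product of letter probabilities, and the case $x \ne y$ with $x = 0$, $y = 1$. The admissibility test $psq \in G(A^*)$ uses the description of $G(A^*)$ given in Section 2.1: the pair $xy$ must not occur. The helper names, the seed and the sample size are our choices.

```python
import random
from itertools import product

def G(w, x, y, a):
    """Pair substitution: from the left, replace each occurrence of xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return tuple(out)

def S(z, x, y, a):
    """Inverse substitution: replace each a with the pair xy."""
    out = []
    for c in z:
        out.extend((x, y) if c == a else (c,))
    return tuple(out)

def Gmu_formula(s, mu, A, x, y, a):
    """Right-hand side of (7.32) for x != y, for a measure given by mu(word)."""
    Z = 1 / (1 - mu((x, y)))
    r = S(s, x, y, a)
    total = 0.0
    for p, q in product(A, repeat=2):
        psq = (p,) + tuple(s) + (q,)
        if all((u, v) != (x, y) for u, v in zip(psq, psq[1:])):   # psq in G(A*)
            total += mu((p,) + r + (q,))
    return Z * total

p1 = 0.3                                # hypothetical i.i.d. measure, P('1') = 0.3
def mu(word):
    prob = 1.0
    for c in word:
        prob *= p1 if c == "1" else 1 - p1
    return prob

rng = random.Random(6)
w = tuple("1" if rng.random() < p1 else "0" for _ in range(200_000))
z = G(w, "0", "1", "a")

def empirical(s):
    """Frequency #{s in z} / |z| appearing in (2.4)."""
    k = len(s)
    return sum(1 for i in range(len(z) - k + 1) if z[i:i + k] == s) / len(z)

for s in [("a",), ("1",), ("a", "0"), ("1", "1"), ("0", "a")]:
    print(s, round(Gmu_formula(s, mu, ("0", "1"), "0", "1", "a"), 4), round(empirical(s), 4))
```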

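Case $x = y$.

Proceeding as above, we again obtain explicit expressions for $\nu(s)$, but they are more complicated. As before, let $s \in G(A^*)$, $|s| = n > 0$, $G(r) = s$, $|r| = k$. Let $s_1, s_n \ne x$. Denoting with $a^p$ the string of $p$ times the symbol $a$, the strings in $G(A^*)$ are of the type

$$a^p x^\pi s\, a^q x^\sigma \quad\text{and}\quad a^p x^\pi, \qquad \text{with } p, q \ge 0 \text{ and } \pi, \sigma = 0, 1.$$

The expression of $\mathcal G\mu = \nu$ in terms of $\mu$ is given by:

$$\nu(sa^q) = Z\mu(rx^{2q}) \quad \text{for } q \ge 0$$
$$\nu(sa^qx) = Z\left(\mu(rx^{2q+1}) - \mu(rx^{2q+2})\right) \quad \text{for } q \ge 0$$
$$\nu(a^p) = Z\sum_{j\ge0}(-1)^j\mu(x^{2p+j}) \quad \text{for } p \ge 1$$
$$\nu(a^px) = Z\Big(\mu(x^{2p+1}) - 2\sum_{j\ge0}(-1)^j\mu(x^{2p+2+j})\Big) \quad \text{for } p \ge 1 \qquad (7.33)$$
$$\nu(a^px^\pi sa^q) = Z\sum_{j\ge0}(-1)^j\mu(x^{2p+\pi+j}rx^{2q}) \quad \text{for } p+\pi \ge 1,\ q \ge 0$$
$$\nu(a^px^\pi sa^qx) = Z\sum_{j\ge0}(-1)^j\Big(\mu(x^{2p+\pi+j}rx^{2q+1}) - \mu(x^{2p+\pi+j}rx^{2q+2})\Big) \quad \text{for } p+\pi \ge 1,\ q \ge 0$$

Now we can calculate $Z$ in terms of $\mu$:

$$Z = 1 + \nu(a) = 1 + Z\sum_{j\ge0}(-1)^j\mu(x^{2+j}) \quad\Longrightarrow\quad Z = \frac{1}{1-\sum_{k\ge2}(-1)^k\mu(x^k)}.$$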



7.5. Proof of Proposition 2.2

This proposition is a consequence of Eq. (7.29) in subsection 7.3, since $|s| \le |r|$ if $S(s) = r$.

7.6. Proof of Proposition 2.3

This proposition follows from the fact that the explicit expression (7.29) of $\mu = \mathcal S\nu$ in terms of $\nu$ can be inverted uniquely if $\nu$ respects the pair constraints given by $G$. This follows from Eqs. (7.30)-(7.33) in subsection 7.4. The expression of $\nu$ in terms of $\mu$ is exactly $\mathcal G\mu$, hence $\nu = \mathcal G\mu = \mathcal G\mathcal S\nu$.



8. Technical results on entropies transformations 

8.1. Proof of Theorem 3.1

The result follows from the fact that $G$ is a faithful code and that $S$ is a faithful code when restricted to the support of $\mathcal G\mu$. Let $C := \{C_n\}_{n\in\mathbb N}$ be a sequence of universal codes in the alphabet $A$, and $C' := \{C'_n\}_{n\in\mathbb N}$ a sequence of universal codes in the alphabet $A'$ (see Theorems II.1.1 and II.1.2, page 122 of […]).

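We have that $C' \circ G$ is a sequence of faithful codes in $A$. From this we deduce that, on a set of $\mu$-measure one,

$$h(\mu) \le \liminf_{n\to+\infty}\frac{|C'(G(w_1^n))|}{n} = \lim_{n\to+\infty}\frac{|G(w_1^n)|}{n}\,\frac{|C'(G(w_1^n))|}{|G(w_1^n)|} = \frac{h(\mathcal G\mu)}{Z}.$$

Likewise, $C \circ S$ is a sequence of faithful codes in $A'$. From this we deduce that, on a set of $\mu$-measure one,

$$h(\mu) = \lim_{n\to+\infty}\frac{|C_n(w_1^n)|}{n} = \lim_{n\to+\infty}\frac{|G(w_1^n)|}{n}\,\frac{|C_n(S(G(w_1^n)))|}{|G(w_1^n)|} \ge \frac{h(\mathcal G\mu)}{Z}.$$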
8.2. Proof of Theorems 3.2 and 3.3

We split the action of $G$ (and hence of $\mathcal G$) into three parts, introducing two new characters $b_1, b_2 \notin A$.

Given a string, we operate as follows:

Step 1. We substitute, starting from the left, each occurrence of $xy$ with $xb_1$. This operation defines a map $R : A^* \to A_R^*$, where $A_R = A \cup \{b_1\}$. We call $\mathcal R$ the corresponding map for the measures, defined in the same spirit as Th. 2.2.

Step 2. We substitute each occurrence of $xb_1$ with $b_2b_1$. This operation defines a map $L : A_R^* \to A_L^*$, where $A_L = A_R \cup \{b_2\}$. We call $\mathcal L$ the corresponding map for the measures.

Step 3. We substitute each occurrence of $b_2b_1$ with $a$. This operation, in general, defines a map $C : A_L^* \to A_C^*$, where $A_C = A_L \cup \{a\}$. We call $\mathcal C$ the corresponding map for the measures.

From these definitions:

$$C(L(R(w))) = G(w), \quad\text{and then}\quad \mathcal C\mathcal L\mathcal R\mu = \mathcal G\mu.$$

With this splitting we separate the effect of the shortening of the strings (step 3) from the effect of the partial substitutions of characters (steps 1, 2).
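The three-step splitting can be checked exhaustively on short words. In the Python sketch below, the names `R_map`, `L_map`, `C_map` and `satisfies_835` are ours, and the new characters are the strings `"b1"`, `"b2"`, `"a"`. The sketch verifies that $C(L(R(w))) = G(w)$ for every binary word of length at most 9, for both $x \ne y$ and $x = y$. It also checks that $L(R(w))$ obeys the pair constraints (8.35) used in Lemma 8.2.

```python
from itertools import product

def G(w, x, y, a):
    """Pair substitution: from the left, replace each occurrence of xy with a."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out.append(a)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

def R_map(w, x, y, b1):
    """Step 1: from the left, replace each occurrence of xy with x b1."""
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == x and w[i + 1] == y:
            out += [x, b1]
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out

def L_map(w, x, b1, b2):
    """Step 2: replace each occurrence of x b1 with b2 b1."""
    out = list(w)
    for i in range(len(out) - 1):
        if out[i] == x and out[i + 1] == b1:
            out[i] = b2
    return out

def C_map(w, b1, b2, a):
    """Step 3: replace each occurrence of b2 b1 with a (a pair substitution itself)."""
    return G(w, b2, b1, a)

def satisfies_835(u):
    """b2 is always followed by b1 and b1 is always preceded by b2, as in (8.35)."""
    inner = all((u[i] == "b2") == (u[i + 1] == "b1") for i in range(len(u) - 1))
    return inner and (not u or (u[0] != "b1" and u[-1] != "b2"))

for x, y in [("0", "1"), ("1", "0"), ("0", "0"), ("1", "1")]:
    for n in range(10):
        for w in product("01", repeat=n):
            rho = L_map(R_map(w, x, y, "b1"), x, "b1", "b2")
            assert C_map(rho, "b1", "b2", "a") == G(w, x, y, "a")
            assert satisfies_835(rho)
```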

Lemma 8.1

$$h_1(\mathcal R\mu) \le h_1(\mu) \qquad (8.34)$$

(the proof is in subsection 8.3).

The same assertion holds for $\mathcal L\mathcal R\mu$. Indeed, we can also define $L$ by performing the substitutions $xb_1 \to b_2b_1$ starting from the right. In this way $L(w) = (R'(w^T))^T$. Here $w^T = (w_1\dots w_k)^T = w_k\dots w_1$, and $R'$ is the substitution, from the left, of $b_1x$ with $b_1b_2$. The map $R'$ acts in the same way as $R$, so Lemma 8.1 holds for the corresponding map for the measures $\mathcal R'$, and then also for $\mathcal L$. In this way we prove that

$$h_1(\mathcal L\mathcal R\mu) \le h_1(\mu).$$

The third step preserves $h_1$ up to the normalization, as stated in the following lemma (proved in subsection 8.4).

Lemma 8.2 If $\rho \in \mathcal E(A_L)$ verifies

$$\rho(b_2w) = \rho(zb_1) = 0 \quad \text{for } w \ne b_1,\ z \ne b_2, \qquad (8.35)$$

then

$$h_1(\mathcal C\rho) = Wh_1(\rho), \qquad (8.36)$$

where

$$W = W^\rho = \frac{1}{1-\rho(b_2)} = 1 + \mathcal C\rho(a). \qquad (8.37)$$

We complete the proof of Theorem 3.2 by observing that the measure $\rho = \mathcal L\mathcal R\mu$ verifies the constraints (8.35). Then $h_1(\mathcal G\mu) \le Wh_1(\mu)$, where $W = Z$ because $W = 1 + \mathcal C\rho(a) = 1 + \mathcal G\mu(a) = W^{\mathcal G\mu}_a = Z^\mu_{xy}$ (see Eq. (2.10)).

We conclude this section by remarking that Lemma 8.1 also holds for $h_k$. Moreover, the following analogue of Lemma 8.2 holds for $h_k$; it is proved in subsection 8.5.

Lemma 8.3 Under the hypotheses of Lemma 8.2,

$$h_k(\mathcal C\rho) \le Wh_k(\rho).$$

Theorem 3.3 follows from these facts.

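8.3. Proof of Lemma 8.1

Let $\xi = \mathcal R\mu$. The measure $\mu$ can be expressed in terms of $\xi$ as follows:

$$\mu(w) = \sum_{z}\xi(z),$$

where the sum runs over the strings $z \in A_R^*$ that coincide with $w$ once each $b_1$ is replaced by $y$. We use this formula to express the probabilities of the symbols and of the pairs of symbols.

Case $x \ne y$. Let $p, q$ be in $A$:

$$\mu(y) = \xi(y) + \xi(b_1), \qquad \mu(p) = \xi(p) \ \text{ for } p \ne y,$$
$$\mu(yp) = \xi(yp) + \xi(b_1p), \qquad \mu(pq) = \xi(pq) \ \text{ for } p \ne x,\ p \ne y,$$
$$\mu(xy) = \xi(xb_1), \qquad \mu(xp) = \xi(xp) \ \text{ for } p \ne y.$$

By direct calculation:

$$h_1(\mu) - h_1(\xi) = -\sum_{p\in A}\left(\xi(yp) + \xi(b_1p)\right)\log\frac{\xi(yp) + \xi(b_1p)}{\xi(y) + \xi(b_1)} + \sum_{p\in A}\left(\xi(yp)\log\frac{\xi(yp)}{\xi(y)} + \xi(b_1p)\log\frac{\xi(b_1p)}{\xi(b_1)}\right). \qquad (8.38)$$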
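We prove the lemma by showing that:

$$\xi(yp)\log\frac{\xi(yp)}{\xi(y)} + \xi(b_1p)\log\frac{\xi(b_1p)}{\xi(b_1)} \ge \left(\xi(yp) + \xi(b_1p)\right)\log\frac{\xi(yp) + \xi(b_1p)}{\xi(y) + \xi(b_1)}.$$

Divide by $\xi(yp) + \xi(b_1p)$ and set $\beta = \frac{\xi(y)}{\xi(y)+\xi(b_1)}$ and $\gamma = \frac{\xi(yp)}{\xi(yp)+\xi(b_1p)}$. The inequality can then be rewritten as:

$$\gamma\log\frac{\beta}{\gamma} + (1-\gamma)\log\frac{1-\beta}{1-\gamma} \le 0,$$

which always holds, since it expresses the non-negativity of the relative entropy between the distributions $(\gamma, 1-\gamma)$ and $(\beta, 1-\beta)$.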
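Case $x = y$. Let $p \in A$, $p \ne x$:

$$\mu(x) = \xi(x) + \xi(b_1), \qquad \mu(p) = \xi(p),$$
$$\mu(pq) = \xi(pq) \ \text{ for } q \in A, \qquad \mu(xp) = \xi(xp) + \xi(b_1p), \qquad \mu(xx) = \xi(xb_1) + \xi(b_1x).$$

The difference between the 1-conditional entropies is

$$h_1(\mu) - h_1(\xi) = -\sum_{p\in A,\,p\ne x}\left(\xi(xp) + \xi(b_1p)\right)\log\frac{\xi(xp) + \xi(b_1p)}{\xi(x) + \xi(b_1)} - \left(\xi(xb_1) + \xi(b_1x)\right)\log\frac{\xi(xb_1) + \xi(b_1x)}{\xi(x) + \xi(b_1)}$$
$$+ \sum_{p\in A,\,p\ne x}\left(\xi(xp)\log\frac{\xi(xp)}{\xi(x)} + \xi(b_1p)\log\frac{\xi(b_1p)}{\xi(b_1)}\right) + \xi(xb_1)\log\frac{\xi(xb_1)}{\xi(x)} + \xi(b_1x)\log\frac{\xi(b_1x)}{\xi(b_1)}. \qquad (8.39)$$

We prove that this difference is non-negative with the same argument used for the case $x \ne y$. Finally, we remark that in the same way we can prove that $h_k(\xi) \le h_k(\mu)$.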

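8.4. Proof of Lemma 8.2

Let $\nu = \mathcal C\rho$ and $W = 1 + \nu(a)$. It is easy to write $\rho$ in terms of $\nu$. Let $p, q \ne b_1, b_2$. The probabilities of the symbols and of the pairs of symbols are given by

$$W\rho(b_1) = W\rho(b_2) = \nu(a), \qquad W\rho(p) = \nu(p),$$
$$W\rho(pb_1) = W\rho(b_2q) = 0, \qquad W\rho(pq) = \nu(pq),$$
$$W\rho(pb_2) = \nu(pa), \qquad W\rho(b_1q) = \nu(aq),$$
$$W\rho(b_2b_1) = \nu(a), \qquad W\rho(b_1b_2) = \nu(aa).$$

By explicit calculation:

$$H_1(\rho) = -\sum_{c\in A\cup\{a\}}\frac{\nu(c)}{W}\log\frac{\nu(c)}{W} - \frac{\nu(a)}{W}\log\frac{\nu(a)}{W} = \frac{H_1(\nu)}{W} + \frac{\log W}{W} - \frac{\nu(a)}{W}\log\frac{\nu(a)}{W},$$

$$H_2(\rho) = -\sum_{c,d\in A\cup\{a\}}\frac{\nu(cd)}{W}\log\frac{\nu(cd)}{W} - \frac{\nu(a)}{W}\log\frac{\nu(a)}{W} = \frac{H_2(\nu)}{W} + \frac{\log W}{W} - \frac{\nu(a)}{W}\log\frac{\nu(a)}{W}.$$

Then

$$h_1(\rho) = H_2(\rho) - H_1(\rho) = \frac{h_1(\nu)}{W}.$$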
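The algebra above can be checked numerically. The Python sketch below takes a hypothetical $\nu$, namely a randomly chosen 1-Markov measure on $A \cup \{a\}$ with $A = \{0, 1\}$. It builds the single-symbol and pair probabilities of $\rho$ from the formulas of this subsection and verifies both normalizations and the identity $h_1(\rho) = h_1(\nu)/W$. The identity uses only these normalizations, so it holds up to floating-point rounding.

```python
import random
from math import log2

def H(probs):
    """Shannon entropy (base 2) of a collection of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical nu: a random 1-Markov measure on A + {a} with A = {0, 1}.
rng = random.Random(7)
sym = ["0", "1", "a"]
P = {}
for u in sym:
    row = [rng.random() + 0.1 for _ in sym]
    P[u] = {v: r / sum(row) for v, r in zip(sym, row)}
pi = {u: 1 / 3 for u in sym}
for _ in range(500):                       # power iteration for the stationary law
    pi = {v: sum(pi[u] * P[u][v] for u in sym) for v in sym}
nu1 = pi
nu2 = {(u, v): pi[u] * P[u][v] for u in sym for v in sym}

# rho in terms of nu, as in subsection 8.4 (p, q range over A = {0, 1}).
W = 1 + nu1["a"]
rho1 = {p: nu1[p] / W for p in "01"}
rho1["b1"] = rho1["b2"] = nu1["a"] / W
rho2 = {(p, q): nu2[p, q] / W for p in "01" for q in "01"}
for p in "01":
    rho2[p, "b2"] = nu2[p, "a"] / W
    rho2["b1", p] = nu2["a", p] / W
rho2["b2", "b1"] = nu1["a"] / W
rho2["b1", "b2"] = nu2["a", "a"] / W       # pairs p b1 and b2 q have probability 0

h1_nu = H(nu2.values()) - H(nu1.values())
h1_rho = H(rho2.values()) - H(rho1.values())
print(round(h1_rho, 6), round(h1_nu / W, 6))
```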
8.5. Proof of Lemma 8.3

We need some definitions. Let $w = w_1^l$. We can identify $w$ with the cylindrical subset $K_w \subset A^{\mathbb Z}$ defined as follows:

$$K_w = \{x \in A^{\mathbb Z} : x_{-l} = w_1,\ x_{-l+1} = w_2,\ \dots,\ x_{-1} = w_l\}.$$

Let $P \subset A^*$ be a finite set. We say that $P$ is a partition if $\{K_w\}_{w\in P}$ is a partition of $A^{\mathbb Z}$, i.e.

(1) $K_w \cap K_z = \emptyset$ if $w \ne z$,

(2) $\bigcup_{w\in P} K_w = A^{\mathbb Z}$.

Condition (1) says that no string of $P$ is a suffix of another string of $P$. If only condition (1) is verified, we say that $P$ is a semi-partition. It is easy to show that any semi-partition can be completed to obtain a partition. Moreover, if the minimum length of the strings in $P$ is $l$, we can complete $P$ using strings of length greater than or equal to $l$.

If $P$ is a partition, we can define the $P$-conditional entropy as

$$h_P(\mu) = -\sum_{w\in P,\ a\in A}\mu(wa)\log\frac{\mu(wa)}{\mu(w)}.$$

Let $P$ and $Q$ be two partitions. We say that $P$ is finer than $Q$ if every string of $P$ ends with a string of $Q$. If $P$ is finer than $Q$:

$$h_P(\mu) \le h_Q(\mu). \qquad (8.40)$$

(The proof is at the end of this section.)

Note that

$$P = \{s \in A_L^* : |C(s)| = k\}$$

is a semi-partition, and that, by direct calculation,

$$h_k(\nu) = Wh_P(\rho),$$

where we recall that $\nu = \mathcal C\rho$. In particular we have used the following facts for $s \in A_L^*$: $\rho(sb_2) = \rho(sb_2b_1)$, and $\rho(sb_1) = 0$ if the last symbol of $s$ differs from $b_2$. Finally, let $\bar P$ be a completion of $P$:

$$h_k(\nu) = Wh_P(\rho) \le Wh_{\bar P}(\rho).$$

The strings in $P$ have length at least $k$, and we construct $\bar P$ so that the same holds for $\bar P$. Therefore $\bar P$ is finer than the partition $A_L^k$. Invoking Eq. (8.40) we conclude that

$$h_k(\nu) \le Wh_{A_L^k}(\rho) = Wh_k(\rho).$$
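Proof of Eq. (8.40).

Let $w \in Q$ and let $X_w \subset P$ be the subset of the strings which end with $w$. From this definition:

$$\mu(w) = \sum_{r\in X_w}\mu(r), \qquad \mu(wa) = \sum_{r\in X_w}\mu(ra).$$

The function $\varphi(x) = x\log x$ is convex. Hence, if $\lambda_i \ge 0$ and $\sum_i\lambda_i = 1$, then $\varphi\left(\sum_i\lambda_ix_i\right) \le \sum_i\lambda_ix_i\log x_i$. Now

$$-h_Q(\mu) = \sum_{w\in Q}\mu(w)\sum_{a\in A}\mu(a|w)\log\mu(a|w),$$

and

$$\mu(a|w) = \sum_{r\in X_w}\frac{\mu(r)}{\mu(w)}\,\mu(a|r).$$

Set $x_r = \mu(ra)/\mu(r)$ and $\lambda_r = \mu(r)/\mu(w)$, and note that $\sum_{r\in X_w}\lambda_r = 1$. We obtain:

$$-h_Q(\mu) = \sum_{w\in Q}\sum_{a\in A}\mu(w)\,\varphi\Big(\sum_{r\in X_w}\lambda_rx_r\Big) \le \sum_{w\in Q}\sum_{r\in X_w}\sum_{a\in A}\mu(r)\,\mu(a|r)\log\mu(a|r) \qquad (8.41)$$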